13 result(s)
2024 Journal article Open Access
Cascaded transformer-based networks for Wikipedia large-scale image-caption matching
Messina N., Coccomini D. A., Esuli A., Falchi F.
With the increasing importance of multimedia and multilingual data in online encyclopedias, novel methods are needed to fill domain gaps and automatically connect different modalities for increased accessibility. For example, Wikipedia is composed of millions of pages written in multiple languages. Images, when present, often lack textual context, thus remaining conceptually floating and harder to find and manage. In this work, we tackle the novel task of associating images from Wikipedia pages with the correct caption among a large pool of available ones written in multiple languages, as required by the image-caption matching Kaggle challenge organized by the Wikimedia Foundation. A system able to perform this task would improve the accessibility and completeness of the underlying multi-modal knowledge graph in online encyclopedias. We propose a cascade of two models powered by recent Transformer networks, able to efficiently and effectively infer a relevance score between the query image data and the captions. We verify through extensive experiments that the proposed cascaded approach effectively handles a large pool of images and captions while keeping the overall computational complexity at inference time bounded. With respect to other approaches in the challenge leaderboard, we achieve remarkable improvements over the previous proposals (+8% in nDCG@5 with respect to the sixth position) with constrained resources. The code is publicly available at https://tinyurl.com/wiki-imcap. (An illustrative sketch of the cascaded idea follows this record.)
Source: Multimedia Tools and Applications (2024). doi:10.1007/s11042-023-17977-0
DOI: 10.1007/s11042-023-17977-0
Project(s): AI4Media via OpenAIRE

See at: link.springer.com Open Access | ISTI Repository Open Access | CNR ExploRA
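The cascaded design described in the record above (a fast model pruning a large caption pool, a heavier model re-scoring the survivors) is not detailed here; the following Python sketch only illustrates the general two-stage idea. The encoders (fast_img_enc, fast_txt_enc, cross_encoder) and the shortlist size k are placeholder assumptions, not the networks used in the paper.

```python
# Hypothetical sketch of a two-stage cascade for image-caption matching:
# a fast dual-encoder prunes the caption pool and a heavier cross-encoder
# re-scores only the shortlisted captions. The encoders are placeholders,
# not the networks used in the paper.
import torch
import torch.nn.functional as F

def cascade_rank(image, captions, fast_img_enc, fast_txt_enc, cross_encoder, k=100):
    """Rank caption indices for a single image tensor."""
    with torch.no_grad():
        # Stage 1: independent embeddings, cosine similarity over the whole pool.
        img_emb = F.normalize(fast_img_enc(image.unsqueeze(0)), dim=-1)   # (1, d)
        cap_emb = F.normalize(fast_txt_enc(captions), dim=-1)             # (N, d)
        coarse = (cap_emb @ img_emb.T).squeeze(-1)                        # (N,)
        top_idx = coarse.topk(min(k, captions.shape[0])).indices

        # Stage 2: expensive joint scoring, restricted to the shortlist.
        fine = torch.stack([cross_encoder(image, captions[i]) for i in top_idx])
    return top_idx[fine.argsort(descending=True)]
```

The design keeps inference cost bounded because the expensive joint model only sees k candidates per query instead of the full caption pool.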


2023 Journal article Open Access
On the generalization of Deep Learning models in video deepfake detection
Coccomini D. A., Caldelli R., Falchi F., Gennaro C.
The increasing use of deep learning techniques to manipulate images and videos, commonly referred to as "deepfakes", is making it more challenging to differentiate between real and fake content. While various deepfake detection systems have been developed, they often struggle to detect deepfakes in real-world situations. In particular, these methods are often unable to effectively distinguish images or videos when these are modified using novel techniques that were not represented in the training set. In this study, we carry out an analysis of different deep learning architectures in an attempt to understand which is more capable of generalizing the concept of deepfake. According to our results, Convolutional Neural Networks (CNNs) seem to be more capable of storing specific anomalies and thus excel on datasets with a limited number of elements and manipulation methodologies. The Vision Transformer, conversely, is more effective when trained with more varied datasets, achieving better generalization capabilities than the other methods analysed. Finally, the Swin Transformer appears to be a good alternative for using an attention-based method in a more limited data regime and performs very well in cross-dataset scenarios. All the analysed architectures seem to have a different way of looking at deepfakes, but since generalization capability is essential in a real-world environment, based on the experiments carried out, the attention-based architectures seem to provide superior performance.
Source: Journal of Imaging 9 (2023). doi:10.3390/jimaging9050089
DOI: 10.3390/jimaging9050089
DOI: 10.20944/preprints202303.0161.v1
Project(s): AI4Media via OpenAIRE

See at: doi.org Open Access | Journal of Imaging Open Access | www.mdpi.com Open Access | CNR ExploRA


2023 Conference article Restricted
Improving query and assessment quality in text-based interactive video retrieval evaluation
Bailer W., Arnold R., Benz V., Coccomini D., Gkagkas A., Þór Guðmundsson G., Heller S., Þór Jónsson B., Lokoc J., Messina N., Pantelidis N., Wu J.
Different task interpretations are a highly undesired element in interactive video retrieval evaluations. When a participating team focuses partially on a wrong goal, the evaluation results might become partially misleading. In this paper, we propose a process for refining known-item and open-set type queries, and preparing the assessors that judge the correctness of submissions to open-set queries. Our findings from recent years reveal that a proper methodology can lead to objective query quality improvements and subjective participant satisfaction with query clarity.
Source: ICMR '23: International Conference on Multimedia Retrieval, pp. 597–601, Thessaloniki, Greece, 12-15/06/2023
DOI: 10.1145/3591106.3592281
Project(s): AI4Media via OpenAIRE

See at: dl.acm.org Restricted | CNR ExploRA


2023 Conference article Open Access
AIMH Lab approaches for deepfake detection
Coccomini D. A., Caldelli R., Esuli A., Falchi F., Gennaro C., Messina N., Amato G.
The creation of highly realistic media known as deepfakes has been facilitated by the rapid development of artificial intelligence technologies, including deep learning algorithms, in recent years. Concerns about the increasing ease of creation and credibility of deepfakes have been growing more and more, prompting researchers around the world to concentrate their efforts on the field of deepfake detection. In this context, researchers at ISTI-CNR's AIMH Lab have conducted numerous studies, investigations and proposals to make their own contribution to combating this worrying phenomenon. In this paper, we present the main work carried out in the field of deepfake detection and synthetic content detection, conducted by our researchers and in collaboration with external organizations.
Source: Ital-IA 2023, pp. 432–436, Pisa, Italy, 29-31/05/2023
Project(s): AI4Media via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA


2023 Contribution to conference Open Access
Deepfake detection: challenges and solutions
Coccomini D. A.
Deepfakes can have a serious impact on the spread of fake news and on people's lives in general, becoming more dangerous every day. Moderation of online content and databases is vital to mitigate this phenomenon, but the development of systems to distinguish between fake and genuine content comes with its own challenges: (a) the lack of generalization capabilities, due to the fact that most deepfake detection models are trained on a specific type of deepfake and struggle to detect deepfakes generated using different techniques; (b) when applying deepfake detectors to the real world, many peculiarities may occur, for example the management of videos in which there are multiple people in the same scene, or the recognition of faces moving towards or away from the camera. In the analysis work we conducted, we started by focusing on the generalization problem, trying to understand whether a particular deep learning architecture was more capable of abstracting the concept of deepfake to such an extent that it could detect images or videos manipulated even with novel techniques. In [2] and [5] we compared Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) of various kinds by putting them in a cross-forgery context, revealing the superiority of the ViTs, which are less tied to the specific anomalies they see during training. After that, noting a scarcity of methods based on ViTs, and even more so of those based on hybrid architectures, we developed our first real deepfake detector. In [1] we created a new architecture, combining an EfficientNet-B0 and a Cross ViT, which we named Convolutional Cross Vision Transformer (an illustrative sketch of this hybrid idea follows this record). Thanks to its local-global attention mechanism and the exploitation of features extracted from the CNN, the model was able to effectively detect deepfake videos, achieving SOTA results on the DFDC [6] and FaceForensics++ [9] datasets, all while keeping the number of parameters low. The model was also used to participate in the competition presented in [7]. In [4] we designed a new type of Convolutional TimeSformer that takes into account both the spatial position of faces in the frame and their temporal position in the video. It is also capable of managing multiple identities and is robust to changes in face size, thanks to the introduction of a novel attention mechanism and positional embedding. Our method surpassed the SOTA on in-dataset tests on [8] and performed robustly in real-world situations. Future work will mainly focus on improving deepfake detectors in order to make them more robust to other real-world problems. We also want to make detectors capable of combining information of a textual nature, context, and the reputation of the account disseminating a video, to understand its veracity. Also, as we started doing in [3], we will work on the more generic problem of synthetic content detection.
Source: SEBD 2023 - 31st Symposium on Advanced Database Systems, pp. 688–689, Galzignano Terme, Italy, 2-5/07/2023

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA
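The hybrid architecture named in the record above (EfficientNet-B0 combined with a Cross ViT) is not reproduced here; the PyTorch sketch below only illustrates the general idea of using a convolutional backbone as a feature tokenizer for a Transformer-style classifier. The class name, layer sizes and the plain TransformerEncoder are illustrative assumptions, not the Convolutional Cross Vision Transformer itself.

```python
# Minimal hybrid CNN + Transformer sketch: a convolutional trunk turns a face
# crop into a grid of feature tokens, which a Transformer encoder classifies
# as real or fake. Illustrative stand-in, not the paper's architecture.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class HybridDeepfakeClassifier(nn.Module):
    def __init__(self, d_model=1280, nhead=8, num_layers=2):
        super().__init__()
        self.backbone = efficientnet_b0(weights=None).features   # conv trunk only
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)                         # real/fake logit

    def forward(self, x):                        # x: (B, 3, 224, 224) face crops
        feat = self.backbone(x)                  # (B, 1280, 7, 7) feature map
        tokens = feat.flatten(2).transpose(1, 2)         # (B, 49, 1280) tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)   # (B, 1, 1280)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(out[:, 0])              # score read from the CLS token
```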


2023 Report Open Access
AIMH Research Activities 2023
Aloia N., Amato G., Bartalesi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application in digital media and humanities. The lab's objective is to enhance the current state of AI technology, particularly in deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report encapsulates the laboratory's progress and activities throughout the year 2023.
Source: ISTI Annual Reports, 2023
DOI: 10.32079/isti-ar-2023/001

See at: ISTI Repository Open Access | CNR ExploRA


2022 Conference article Open Access
AIMH Lab for Trustworthy AI
Messina N., Carrara F., Coccomini D., Falchi F., Gennaro C., Amato G.
In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Trustworthy AI. Artificial Intelligence is becoming more and more pervasive in our society, controlling recommendation systems in social platforms as well as safety-critical systems like autonomous vehicles. In order to be safe and trustworthy, these systems need to be easily interpretable and transparent. On the other hand, it is important to spot fake examples forged by malicious AI generative models to fool humans (through fake news or deep-fakes) or other AI systems (through adversarial examples). This is required to enforce an ethical use of these powerful new technologies. Driven by these concerns, this paper presents three crucial research directions contributing to the study and the development of techniques for reliable, resilient, and explainable deep learning methods. Namely, we report the laboratory activities on the detection of adversarial examples, the use of attentive models as a way towards explainable deep learning, and the detection of deepfakes in social platforms.
Source: Ital-IA 2022 - Workshop su AI Responsabile ed Affidabile, Online conference, 10/02/2022

See at: ISTI Repository Open Access | www.ital-ia2022.it Open Access | CNR ExploRA


2022 Conference article Open Access
AIMH Lab for Cybersecurity
Vairo C., Coccomini D. A., Falchi F., Gennaro C., Massoli F. V., Messina N., Amato G.
In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Cybersecurity. We discuss our active research fields, their applications and challenges. We focus on face recognition and on the detection of adversarial examples and deep fakes. We also present our activities on the detection of persuasion techniques combining image and text analysis.
Source: Ital-IA 2022 - Workshop su AI per Cybersecurity, 10/02/2022

See at: ISTI Repository Open Access | www.ital-ia2022.it Open Access | CNR ExploRA


2022 Conference article Open Access
Combining EfficientNet and vision transformers for video deepfake detection
Coccomini D. A., Messina N., Gennaro C., Falchi F.
Deepfakes are the result of digital manipulation to forge realistic yet fake imagery. With the astonishing advances in deep generative models, fake images or videos are nowadays obtained using variational autoencoders (VAEs) or Generative Adversarial Networks (GANs). These technologies are becoming more accessible and accurate, resulting in fake videos that are very difficult to detect. Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we focus on video deepfake detection on faces, given that most generation methods are becoming extremely accurate at producing realistic human faces. Specifically, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining results comparable with some very recent methods that use Vision Transformers. Differently from state-of-the-art approaches, we use neither distillation nor ensemble methods. Furthermore, we present a straightforward inference procedure based on a simple voting scheme for handling multiple faces in the same video shot (an illustrative sketch of such a voting scheme follows this record). The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state of the art on the DeepFake Detection Challenge (DFDC). The code for reproducing our results is publicly available here: https://tinyurl.com/cnn-vit-dfd.
Source: ICIAP 2022 - 21st International Conference on Image Analysis and Processing, pp. 219–229, Lecce, Italy, 23-27/05/2022
DOI: 10.1007/978-3-031-06433-3_19

See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA
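The record above mentions a simple voting scheme over multiple faces in the same video shot, but the exact rule is not given in the abstract. The sketch below shows one plausible reading under stated assumptions (average each face track's frame scores, then let the faces vote); the function name, threshold and the "at least half" rule are assumptions, not the paper's procedure.

```python
# Illustrative sketch of a simple voting scheme turning per-face, per-frame
# fake scores into one video-level decision. The exact rule used in the paper
# is not reproduced; this shows one plausible reading.
from typing import Dict, List

def video_is_fake(face_scores: Dict[str, List[float]], threshold: float = 0.5) -> bool:
    """face_scores maps a face-track id to its per-frame fake probabilities."""
    votes = []
    for scores in face_scores.values():
        mean_score = sum(scores) / len(scores)   # average the track's frame scores
        votes.append(mean_score > threshold)     # one vote per detected face
    return sum(votes) >= max(1, len(votes) / 2)  # "fake" if at least half agree

# Example: two faces in the same shot, only one of them looks manipulated.
print(video_is_fake({"face_0": [0.9, 0.8, 0.95], "face_1": [0.1, 0.2, 0.15]}))
```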


2022 Report Open Access
AIMH research activities 2022
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2022 activities of the research group.
Source: ISTI Annual Reports, 2022
DOI: 10.32079/isti-ar-2022/002

See at: ISTI Repository Open Access | CNR ExploRA


2022 Journal article Open Access
The face deepfake detection challenge
Guarnera L., Giudice O., Guarnera F., Ortis A., Puglisi G., Paratore A., Bui L. M. Q., Fontani M., Coccomini D. A., Caldelli R., Falchi F., Gennaro C., Messina N., Amato G., Perelli G., Concas S., Cuccu C., Orru G., Marcialis G. L., Battiato S.
Multimedia data manipulation and forgery have never been easier than today, thanks to the power of Artificial Intelligence (AI). AI-generated fake content, commonly called Deepfakes, has been raising new issues and concerns, but also new challenges for the research community. The Deepfake detection task has become widely addressed, but unfortunately, approaches in the literature suffer from generalization issues. In this paper, the Face Deepfake Detection and Reconstruction Challenge is described. Two different tasks were proposed to the participants: (i) creating a Deepfake detector capable of working in an "in the wild" scenario; (ii) creating a method capable of reconstructing original images from Deepfakes. Real images from CelebA and FFHQ and Deepfake images created by StarGAN, StarGAN-v2, StyleGAN, StyleGAN2, AttGAN and GDWCT were collected for the competition. The winning teams were chosen with respect to the highest classification accuracy value (Task I) and the "minimum average distance to Manhattan" (Task II), i.e. an average Manhattan (L1) distance (a hedged sketch of such a distance follows this record). Deep Learning algorithms, particularly those based on the EfficientNet architecture, achieved the best results in Task I. No winners were proclaimed for Task II. A detailed discussion of the teams' proposed methods, with the corresponding ranking, is presented in this paper.
Source: Journal of Imaging 8 (2022). doi:10.3390/jimaging8100263
DOI: 10.3390/jimaging8100263

See at: Journal of Imaging Open Access | ISTI Repository Open Access | www.mdpi.com Open Access | CNR ExploRA
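The Task II criterion quoted in the record above reads as an average Manhattan (L1) distance, plausibly between reconstructed and original images; the challenge's exact normalization and evaluation protocol are not given here, so the following is only a hedged sketch of such a metric under those assumptions.

```python
# Hedged sketch of an average Manhattan (L1) distance between reconstructed
# and original images. Normalization and pairing protocol are assumptions;
# lower values would indicate better reconstructions.
import numpy as np

def average_l1_distance(originals, reconstructions):
    """Mean per-pixel absolute difference, averaged over all image pairs."""
    assert len(originals) == len(reconstructions)
    dists = [np.mean(np.abs(o.astype(np.float64) - r.astype(np.float64)))
             for o, r in zip(originals, reconstructions)]
    return float(np.mean(dists))
```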


2022 Conference article Open Access
Cross-forgery analysis of vision transformers and CNNs for deepfake image detection
Coccomini D. A., Caldelli R., Falchi F., Gennaro C., Amato G.
Deepfake generation techniques are evolving at a rapid pace, making it possible to create realistic manipulated images and videos and endangering the serenity of modern society. The continual emergence of new and varied techniques brings with it a further problem to be faced, namely the ability of deepfake detection models to update themselves promptly in order to identify manipulations carried out using even the most recent methods. This is an extremely complex problem to solve, as training a model requires large amounts of data, which are difficult to obtain if the deepfake generation method is too recent. Moreover, continuously retraining a network would be unfeasible. In this paper, we ask ourselves if, among the various deep learning techniques, there is one that is able to generalise the concept of deepfake to such an extent that it does not remain tied to one or more specific deepfake generation methods used in the training set. We compared a Vision Transformer with an EfficientNetV2 in a cross-forgery context based on the ForgeryNet dataset (an illustrative sketch of a cross-forgery split follows this record). From our experiments, it emerges that EfficientNetV2 has a greater tendency to specialize, often obtaining better results on the generation methods seen during training, while Vision Transformers exhibit a superior generalization ability that makes them more competent even on images generated with new methodologies.
Source: MAD '22 - 1st International Workshop on Multimedia AI against Disinformation, pp. 52–58, Newark, NJ, USA, 27/06/2022
DOI: 10.1145/3512732.3533582
DOI: 10.48550/arxiv.2206.13829
Project(s): AI4Media via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | ISTI Repository Open Access | ZENODO Open Access | dl.acm.org Restricted | doi.org Restricted | doi.org Restricted | CNR ExploRA
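The cross-forgery setting described in the record above (training on some generation methods and testing on methods never seen in training) can be illustrated with a simple split helper. The sample layout and method names below are invented for illustration and do not reproduce the ForgeryNet protocol used in the paper.

```python
# Illustrative cross-forgery split: train on samples produced by a chosen set
# of generation methods and test on methods held out of training. The tuple
# layout (path, label, method) and the method names are assumptions.
import random

def cross_forgery_split(samples, train_methods, seed=0):
    """samples: iterable of (image_path, label, generation_method) tuples."""
    train = [s for s in samples if s[2] in train_methods]
    test = [s for s in samples if s[2] not in train_methods]
    random.Random(seed).shuffle(train)
    return train, test

# Example (hypothetical method names): a detector trained on two methods is
# then evaluated only on forgeries produced by the held-out methods.
# train_set, test_set = cross_forgery_split(all_samples, {"method_A", "method_B"})
```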


2021 Report Open Access
AIMH research activities 2021
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Coccomini D., Ciampi L., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2021 activities of the research group.
Source: ISTI Annual Report, ISTI-2021-AR/003, pp. 1–34, 2021
DOI: 10.32079/isti-ar-2021/003

See at: ISTI Repository Open Access | CNR ExploRA